Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization
Authors
Abstract
We propose a technique for producing ‘visual explanations’ for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent. Our approach, Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept (say, logits for ‘dog’ or even a caption) flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image for predicting the concept. Unlike previous approaches, Grad-CAM is applicable to a wide variety of CNN model families: (1) CNNs with fully-connected layers (e.g. VGG), (2) CNNs used for structured outputs (e.g. captioning), (3) CNNs used in tasks with multi-modal inputs (e.g. VQA) or reinforcement learning, without architectural changes or re-training. We combine Grad-CAM with existing fine-grained visualizations to create a high-resolution class-discriminative visualization and apply it to image classification, image captioning, and visual question answering (VQA) models, including ResNet-based architectures. In the context of image classification models, our visualizations (a) lend insights into failure modes of these models (showing that seemingly unreasonable predictions have reasonable explanations), (b) are robust to adversarial images, (c) outperform previous methods on the ILSVRC-15 weakly-supervised localization task, (d) are more faithful to the underlying model, and (e) help achieve model generalization by identifying dataset bias. For image captioning and VQA, our visualizations show that even non-attention-based models can localize inputs. Finally, we design and conduct human studies to measure whether Grad-CAM explanations help users establish appropriate trust in predictions from deep networks, and show that Grad-CAM helps untrained users successfully discern a ‘stronger’ deep network from a ‘weaker’ one. Our code is available at https://github.com/ramprs/grad-cam/ and a demo is available on CloudCV at http://gradcam.cloudcv.org [2]. A video of the demo can be found at youtu.be/COjUB9Izk6E.
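The abstract's core mechanism can be sketched in a few lines: global-average-pool the gradients of the target score over the final convolutional layer's feature maps to get one importance weight per channel, take the weighted combination of the forward activation maps, and apply a ReLU to keep only positively influential regions. The following is a minimal numpy sketch under assumed array shapes, not the authors' released code; the function name `grad_cam` and the `(channels, height, width)` layout are illustrative assumptions.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Coarse localization map from final-conv activations and gradients.

    activations: (K, H, W) feature maps of the final convolutional layer
    gradients:   (K, H, W) gradient of the target class score w.r.t. those maps
    Returns an (H, W) map, normalized to [0, 1] when non-zero.
    """
    # Global-average-pool the gradients: one importance weight per channel.
    weights = gradients.mean(axis=(1, 2))              # shape (K,)
    # Weighted combination of the forward activation maps.
    cam = np.tensordot(weights, activations, axes=1)   # shape (H, W)
    # ReLU: keep only features with a positive influence on the target class.
    cam = np.maximum(cam, 0.0)
    # Normalize for visualization (skip if the map is all zeros).
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam
```

In practice the resulting coarse map is upsampled to the input resolution and, as the abstract notes, can be fused with a fine-grained visualization (e.g. pointwise multiplication with guided backpropagation) to obtain a high-resolution class-discriminative result.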
Similar Resources
Grad-CAM++: Generalized Gradient-based Visual Explanations for Deep Convolutional Networks
Over the last decade, Convolutional Neural Network (CNN) models have been highly successful in solving complex vision-based problems. However, deep models are perceived as "black box" methods, given the lack of understanding of their internal functioning. There has been significant recent interest in developing explainable deep learning models, and this paper is an effort in this direction....
Grad-CAM: Why did you say that?
We propose a technique for making Convolutional Neural Network (CNN)-based models more transparent by visualizing input regions that are ‘important’ for predictions – producing visual explanations. Our approach, called Gradient-weighted Class Activation Mapping (Grad-CAM), uses class-specific gradient information to localize important regions. These localizations are combined with existing pixe...
Dream Formulations and Deep Neural Networks: Humanistic Themes in the Iconology of the Machine-Learned Image
This paper addresses the interpretability of deep learning-enabled image recognition processes in computer vision science in relation to theories in art history and cognitive psychology on the vision-related perceptual capabilities of humans. Examination of what is determinable about the machine-learned image in comparison to humanistic theories of visual perception, particularly in regard to a...
Visual Explanations from Hadamard Product in Multimodal Deep Networks
Visual explanation of a model's learned representation helps in understanding the fundamentals of learning. Previous attentional models visualized the attended regions over an image or text using their learned weights to confirm their intended mechanism. Kim et al. (2016) show that the Hadamard product in multimodal deep networks, which is well-known for the joint function ...
Deep Adaptive Networks for Visual Data Classification
This paper proposes a classifier called deep adaptive networks (DAN), based on deep belief networks (DBN), for visual data classification. First, we construct directed deep belief nets by using a set of Restricted Boltzmann Machines (RBM) and a Gaussian RBM via greedy, layerwise unsupervised learning. Then, we refine the parameter space of the deep architecture to adapt the classification re...
Journal: CoRR
Volume: abs/1610.02391
Pages: -
Publication year: 2016